Smart Paper Notes

MDA GAN: Adversarial-Learning-based 3-D Seismic Data Interpolation and Reconstruction for Complex Missing

Yimin Dou , Kewen Li , Hongjie Duan , Timing Li , Lin Dong , Zongchao Huang

Category: Computer Vision

2022-04-07

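Interpolation and reconstruction of missing traces is a crucial step in seismic data processing. It is also a highly ill-posed problem, especially in complex cases such as high-ratio random discrete missing, continuous missing, and missing in fault-rich or salt-body surveys. These complex cases are rarely addressed in current work. To cope with them, we propose the Multi-Dimensional Adversarial GAN (MDA GAN), a novel 3-D GAN framework. It uses three discriminators to preserve the anisotropy and spatial continuity of the data after reconstruction of complex 3-D missing. A feature splicing module is designed and embedded into the generator to retain more information from the input data. A Tanh Cross Entropy (TCE) loss is derived, which provides the generator with an optimal reconstruction gradient so that the generated data are smoother and more continuous. We experimentally verify the effectiveness of the individual components of the study and then test the method on multiple publicly available datasets. The method achieves reasonable reconstructions for up to 95% random discrete missing and 100 traces of continuous missing. In fault-rich and salt-body surveys, MDA GAN still yields promising results for complex cases. Experiments demonstrate that our method performs better than other methods in both simple and complex cases. https://github.com/douyimin/mda_gan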

translated by Google Translate
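
The two missing patterns named in the abstract (high-ratio random discrete missing and a run of continuous missing traces) can be mimicked on a toy volume. The sketch below is illustrative only and is not taken from the MDA GAN repository: the function names, the (inline, crossline, time) array layout, and the reading of "continuous missing" as a gap of consecutive crosslines are all assumptions.

```python
import numpy as np


def random_discrete_mask(n_inline, n_xline, missing_ratio, rng):
    """Boolean (n_inline, n_xline) mask; True marks a trace that is kept."""
    n_traces = n_inline * n_xline
    n_missing = int(round(missing_ratio * n_traces))
    keep = np.ones(n_traces, dtype=bool)
    # Drop n_missing distinct traces chosen uniformly at random.
    keep[rng.choice(n_traces, size=n_missing, replace=False)] = False
    return keep.reshape(n_inline, n_xline)


def continuous_mask(n_inline, n_xline, start, width):
    """Mask that drops `width` consecutive crosslines beginning at `start`."""
    keep = np.ones((n_inline, n_xline), dtype=bool)
    keep[:, start:start + width] = False
    return keep


rng = np.random.default_rng(0)
# Synthetic stand-in for a seismic cube (assumed layout: inline, crossline, time).
volume = rng.standard_normal((128, 128, 64)).astype(np.float32)

mask_random = random_discrete_mask(128, 128, 0.95, rng)   # 95% of traces missing
mask_gap = continuous_mask(128, 128, start=14, width=100)  # 100-trace gap

# A missing trace loses all of its time samples, so broadcast over the last axis.
corrupted_random = volume * mask_random[:, :, None]
corrupted_gap = volume * mask_gap[:, :, None]
```

A reconstruction model would take `corrupted_random` or `corrupted_gap` as input, with `volume` as the target.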

MD Loss: Efficient Training of 3D Seismic Fault Segmentation Network under Sparse Labels by Weakening Anomaly Annotation

Yimin Dou , Kewen Li , Jianbing Zhu , Timing Li , Shaoquan Tan , Zongchao Huang

Category: Computer Vision

2021-10-11

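Data-driven fault detection has been regarded as a 3D image segmentation task. Models trained on synthetic data are difficult to generalize to some surveys. Recently, training 3D fault segmentation with sparse manual 2D slices has been shown to yield promising results, but manual labeling contains many false negative labels (anomalous annotations), which is detrimental to training and consequently to detection performance. Motivated to train 3D fault segmentation networks under sparse 2D labels while suppressing false negative labels, we analyze the gradients of the training process and propose the Mask Dice (MD) loss. Moreover, a fault is an edge feature, and the encoder-decoder architectures currently in wide use for fault detection (e.g., U-shaped networks) are not conducive to edge representation. We therefore propose Fault-Net, which is designed for the characteristics of faults: it propagates high-resolution features and embeds a multi-scale compression fusion block to fuse multi-scale information, so that edge information is fully preserved during propagation and fusion and advanced performance is achieved with few computational resources. Experiments show that the MD loss supports the inclusion of human experience in training and suppresses the false negative labels it contains, enabling baseline models to improve performance and generalize to more surveys. Fault-Net provides a more stable and reliable interpretation of faults, uses extremely low computational resources, and runs inference significantly faster than other models. Our method shows the best performance in comparison with several mainstream methods.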
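
The general idea of restricting a Dice loss to trusted voxels can be sketched as follows. This is a minimal illustration under stated assumptions, not the MD loss as defined in the paper: the abstract does not give the formula, and the `mask` argument (1 for voxels whose labels are trusted, 0 elsewhere) and the function name are hypothetical.

```python
import numpy as np


def masked_dice_loss(pred, target, mask, eps=1e-6):
    """Dice loss computed only over voxels where mask == 1.

    pred   : predicted fault probabilities in [0, 1]
    target : binary labels (1 = fault)
    mask   : 1 where the label is trusted, 0 where it should be ignored
    """
    pred = pred * mask
    target = target * mask
    intersection = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


rng = np.random.default_rng(0)

# Toy 4x4x4 volume in which only slice 1 carries manual labels.
target = np.zeros((4, 4, 4))
target[1, :, 2] = 1.0          # a fault line drawn on the labeled slice
mask = np.zeros((4, 4, 4))
mask[1] = 1.0                  # every other slice is unlabeled and ignored

pred = rng.random((4, 4, 4))   # arbitrary predictions on unlabeled slices
pred[1] = target[1]            # perfect prediction on the labeled slice

loss = masked_dice_loss(pred, target, mask)
```

Because masked-out voxels contribute to neither the numerator nor the denominator, predictions on unlabeled slices produce no gradient signal in this sketch, which is the property that lets sparse 2D labels supervise a 3D network.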

Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge

Longxu Dou , Yan Gao , Xuqi Liu , Mingyang Pan , Dingzirui Wang , Wanxiang Che , Dechen Zhan , Min-Yen Kan , Jian-Guang Lou

Category: Natural Language Processing

2023-01-03

In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.

Human-in-the-loop Embodied Intelligence with Interactive Simulation Environment for Surgical Robot Learning

Yonghao Long , Wang Wei , Tao Huang , Yuehao Wang , Qi Dou

Category: Robotics | Artificial Intelligence | Computer Vision | Machine Learning

2023-01-01

Surgical robot automation has attracted increasing research interest over the past decade because of its potential to benefit surgeons, nurses and patients. Recently, the learning paradigm of embodied AI has demonstrated a promising ability to learn good control policies for various complex tasks, and embodied AI simulators play an essential role in facilitating this research. However, existing open-source simulators for surgical robots still do not sufficiently support human interaction through physical input devices, which limits effective investigation of how human demonstrations affect policy learning. In this paper, we study human-in-the-loop embodied intelligence with a new interactive simulation platform for surgical robot learning. Specifically, we build our platform on our previously released SurRoL simulator, with several new features co-developed to allow high-quality human interaction via an input device. With these, we further propose to collect human demonstrations and imitate the action patterns to achieve more effective policy learning. We showcase the improvement of our simulation environment with the newly designed features and tasks, and validate state-of-the-art reinforcement learning algorithms using the interactive environment. Promising results are obtained, with which we hope to pave the way for future research on surgical embodied intelligence. Our platform is released and will be continuously updated on the website: https://med-air.github.io/SurRoL/

Diffusion Model based Semi-supervised Learning on Brain Hemorrhage Images for Efficient Midline Shift Quantification

Shizhan Gong , Cheng Chen , Yuqi Gong , Nga Yan Chan , Wenao Ma , Calvin Hoi-Kwan Mak , Jill Abrigo , Qi Dou

Category: Computer Vision | Artificial Intelligence

2023-01-01

Brain midline shift (MLS) is one of the most critical factors to be considered for clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods for MLS quantification not only require intensive labeling with millimeter-level measurements but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a large amount of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image differs from a non-MLS image, and regularization plays an important role in the sparse-to-dense refinement of the deformation field. Our experiments on a real clinical brain hemorrhage dataset achieve state-of-the-art performance and generate interpretable deformation fields.

MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Longxu Dou , Yan Gao , Mingyang Pan , Dingzirui Wang , Wanxiang Che , Dechen Zhan , Jian-Guang Lou

Category: Natural Language Processing

2022-12-27

Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.

A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions

Dingzirui Wang , Longxu Dou , Wanxiang Che

Category: Natural Language Processing | Artificial Intelligence

2022-12-27

Table-and-text hybrid question answering (HybridQA) is a widely used and challenging NLP task commonly applied in the financial and scientific domains. Early research focused on migrating methods from other QA tasks to HybridQA, while with further research, more and more HybridQA-specific methods have been presented. Despite the rapid development of HybridQA, a systematic survey that summarizes the main techniques and advances further research is still lacking. We therefore present this work to summarize the current HybridQA benchmarks and methods, then analyze the challenges and future directions of this task. The contributions of this paper are threefold: (1) the first survey, to the best of our knowledge, covering benchmarks, methods and challenges for HybridQA; (2) a systematic investigation with a reasonable comparison of the existing systems to articulate their advantages and shortcomings; (3) a detailed analysis of challenges in four important dimensions to shed light on future directions.

Generalized Decoding for Pixel, Image, and Language

Xueyan Zou , Zi-Yi Dou , Jianwei Yang , Zhe Gan , Linjie Li , Chunyuan Li , Xiyang Dai , Harkirat Behl , Jianfeng Wang , Lu Yuan

Category: Computer Vision | Natural Language Processing

2022-12-21

We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text inputs, to decode different pixel-level and token-level outputs in the same semantic space. With such a novel design, X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks. Further, our design enables seamless interactions across tasks at different granularities and brings mutual benefits by learning a common and rich pixel-level visual-semantic understanding space, without any pseudo-labeling. After pretraining on a mixed set of a limited amount of segmentation data and millions of image-text pairs, X-Decoder exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Notably, it achieves (1) state-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets; (2) better or competitive finetuned performance compared to other generalist and specialist models on segmentation and VL tasks; and (3) flexibility for efficient finetuning and novel task composition (e.g., referring captioning and image editing). Code, demo, video, and visualization are available at https://x-decoder-vl.github.io.

Temporal Output Discrepancy for Loss Estimation-based Active Learning

Siyu Huang , Tianyang Wang , Haoyi Xiong , Bihan Wen , Jun Huan , Dejing Dou

Category: Computer Vision | Machine Learning

2022-12-20

While deep learning succeeds in a wide range of tasks, it depends heavily on the massive collection of annotated data, which is expensive and time-consuming. To lower the cost of data annotation, active learning has been proposed to interactively query an oracle to annotate a small proportion of informative samples in an unlabeled dataset. Inspired by the fact that samples with higher loss are usually more informative to the model than samples with lower loss, in this paper we present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incur a high loss. The core of our approach is a measurement, Temporal Output Discrepancy (TOD), that estimates the sample loss by evaluating the discrepancy of outputs given by models at different optimization steps. Our theoretical investigation shows that TOD lower-bounds the accumulated sample loss, so it can be used to select informative unlabeled samples. On the basis of TOD, we further develop an effective unlabeled data sampling strategy as well as an unsupervised learning criterion for active learning. Due to the simplicity of TOD, our methods are efficient, flexible, and task-agnostic. Extensive experimental results demonstrate that our approach achieves performance superior to state-of-the-art active learning methods on image classification and semantic segmentation tasks. In addition, we show that TOD can be utilized to select the best model, the one with potentially the highest testing accuracy, from a pool of candidate models.
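
The selection rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the discrepancy is the per-sample Euclidean distance between the outputs of two model snapshots, which the abstract does not specify, and the function names and toy output arrays are hypothetical.

```python
import numpy as np


def temporal_output_discrepancy(outputs_earlier, outputs_later):
    """Per-sample L2 distance between the outputs of two model snapshots.

    Both arrays have shape (n_samples, n_outputs) and hold the outputs of the
    same unlabeled samples at an earlier and a later optimization step.
    """
    return np.linalg.norm(outputs_later - outputs_earlier, axis=1)


def select_for_annotation(outputs_earlier, outputs_later, budget):
    """Indices of the `budget` samples with the largest discrepancy."""
    tod = temporal_output_discrepancy(outputs_earlier, outputs_later)
    return np.argsort(-tod)[:budget]


# Toy outputs for four unlabeled samples at two optimization steps.
outputs_earlier = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 1.0]])
outputs_later = np.array([[0.0, 0.0], [1.0, 2.0], [5.0, 6.0], [0.0, 1.5]])

tod = temporal_output_discrepancy(outputs_earlier, outputs_later)
queried = select_for_annotation(outputs_earlier, outputs_later, budget=2)
```

Samples whose outputs move the most between snapshots are treated as the likely high-loss ones and sent to the oracle first. No labels are needed to compute the score, which is why it applies to the unlabeled pool.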

ADAS: A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation

Ben Fei , Siyuan Huang , Jiakang Yuan , Botian Shi , Bo Zhang , Tao Chen , Min Dou , Yu Qiao

Category: Computer Vision | Machine Learning

2022-12-20

State-of-the-art 3D semantic segmentation models are trained on off-the-shelf public benchmarks, but they often face a major challenge when these well-trained models are deployed to a new domain. In this paper, we propose an Active-and-Adaptive Segmentation (ADAS) baseline to enhance the weak cross-domain generalization ability of a well-trained 3D segmentation model, and bridge the point distribution gap between domains. Specifically, before the cross-domain adaptation stage begins, ADAS performs an active sampling operation to select a maximally-informative subset from both source and target domains for effective adaptation, reducing the adaptation difficulty under 3D scenarios. Benefiting from the rise of multi-modal 2D-3D datasets, ADAS utilizes a cross-modal attention-based feature fusion module that can extract a representative pair of image features and point features to achieve a bi-directional image-point feature interaction for better safe adaptation. Experimentally, ADAS is verified to be effective in many cross-domain settings including: 1) Unsupervised Domain Adaptation (UDA), which means that all samples from the target domain are unlabeled; 2) Unsupervised Few-shot Domain Adaptation (UFDA), which means that only a few unlabeled samples are available in the unlabeled target domain; 3) Active Domain Adaptation (ADA), which means that the target samples selected by ADAS are manually annotated. The results demonstrate that ADAS achieves a significant accuracy gain when coupled with self-training methods or off-the-shelf UDA works.
